-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove unused token_type_ids in MPNet #9564
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @LysandreJik But I don't think this is the way to go. We should just silently remove the token_type_ids
from the tokenizer and the model args at the same time. I don't think this warrants a deprecation cycle since it's not used.
I'm totally fine to definitely suppress this argument in once if this is prefered (I would prefer as well) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to put warnings however, as @sgugger said I would simply remove these from both the tokenizer and the forward call. Their presence is a bug, and not a feature we're modifying.
@@ -582,6 +578,7 @@ def call( | |||
training=training, | |||
kwargs_call=kwargs, | |||
) | |||
print(inputs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adapting @jplu!
Not an expert on the tokenization part, but is the method build_inputs_with_special_tokens
still necessary in that case? (It's in both tokenizers files.)
Not an expert neither. Maybe @LysandreJik knows better. |
I think it's still required as it puts e.g. the [sep] token correctly between two sentences. I don't think that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, LGTM!
Co-authored-by: Lysandre Debut <[email protected]>
What does this PR do?
This PR adds a warning when the argument
token_type_ids
is given a show a message saying that this argument is never used. I just supressed the internal appearance of this argument without modifying the method signatures in order to do not integrates a breaking change.Should I update the Tokenizer to make it returns only
attention_mask
?